ranking rule
Ordinal Regression by Extended Binary Classification
We present a reduction framework from ordinal regression to binary classification based on extended examples. The framework consists of three steps: extracting extended examples from the original examples, learning a binary classifier on the extended examples with any binary classification algorithm, and constructing a ranking rule from the binary classifier. A weighted 0/1 loss of the binary classifier then bounds the mislabeling cost of the ranking rule. Our framework makes it possible not only to design good ordinal regression algorithms based on well-tuned binary classification approaches, but also to derive new generalization bounds for ordinal regression from known bounds for binary classification. When compared empirically on benchmark data sets, some of our newly designed algorithms enjoy advantages in terms of both training speed and generalization performance over existing algorithms, which demonstrates the usefulness of our framework.
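The three steps named in the abstract can be illustrated with a small sketch. Everything concrete here is an assumption made for illustration and is not taken from the abstract: ranks are 1..K, each extended example asks "is y > k?", the threshold index k is one-hot encoded and appended to x, all example weights are 1, and a plain perceptron stands in for "any binary classification algorithm".

```python
# Sketch only: ordinal regression through extended binary examples.
# The encoding, the unit weights and the perceptron are illustrative assumptions.

def one_hot(k, K):
    """Encode threshold index k in {1, ..., K-1} as a 0/1 vector."""
    return [1.0 if j == k else 0.0 for j in range(1, K)]

def extend(x, y, K):
    """Step 1: turn (x, y) into K-1 binary examples labelled by 'y > k'."""
    return [(list(x) + one_hot(k, K), 1 if y > k else -1) for k in range(1, K)]

def train_perceptron(data, epochs=1000):
    """Step 2: any binary learner would do; a perceptron keeps this stdlib-only."""
    w = [0.0] * len(data[0][0])
    for _ in range(epochs):
        mistakes = 0
        for z, label in data:
            score = sum(wi * zi for wi, zi in zip(w, z))
            if label * score <= 0:
                w = [wi + label * zi for wi, zi in zip(w, z)]
                mistakes += 1
        if mistakes == 0:
            break
    return w

def predict_rank(w, x, K):
    """Step 3: rank = 1 + number of thresholds the classifier says y exceeds."""
    rank = 1
    for k in range(1, K):
        z = list(x) + one_hot(k, K)
        if sum(wi * zi for wi, zi in zip(w, z)) > 0:
            rank += 1
    return rank

K = 3
train = [([0.0], 1), ([0.1], 1), ([0.4], 2), ([0.5], 2), ([0.9], 3), ([1.0], 3)]
extended = [e for x, y in train for e in extend(x, y, K)]
w = train_perceptron(extended)
predictions = [predict_rank(w, x, K) for x, _ in train]
print(predictions)
```

On this toy data the extended examples are linearly separable, so the perceptron stops once it classifies all of them correctly and the constructed ranking rule reproduces the training ranks.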
Aggregation of Pareto optimal models
Bajgiran, Hamed Hamze, Owhadi, Houman
In statistical decision theory, a model is said to be Pareto optimal (or admissible) if no other model carries less risk for at least one state of nature while presenting no more risk for others. How can you rationally aggregate/combine a finite set of Pareto optimal models while preserving Pareto efficiency? This question is nontrivial because weighted model averaging does not, in general, preserve Pareto efficiency. This paper presents an answer in four logical steps: (1) A rational aggregation rule should preserve Pareto efficiency. (2) Due to the complete class theorem, Pareto optimal models must be Bayesian, i.e., they minimize a risk where the true state of nature is averaged with respect to some prior. Therefore each Pareto optimal model can be associated with a prior, and Pareto efficiency can be maintained by aggregating Pareto optimal models through their priors. (3) A prior can be interpreted as a preference ranking over models: prior $\pi$ prefers model A over model B if the average risk of A is lower than the average risk of B. (4) A rational/consistent aggregation rule should preserve this preference ranking: If both priors $\pi$ and $\pi'$ prefer model A over model B, then the prior obtained by aggregating $\pi$ and $\pi'$ must also prefer A over B. Under these four steps, we show that all rational/consistent aggregation rules are as follows: Give each individual Pareto optimal model a weight, introduce a weak order/ranking over the set of Pareto optimal models, and aggregate a finite set of models S as the model associated with the prior obtained as the weighted average of the priors of the highest-ranked models in S. This result shows that all rational/consistent aggregation rules must follow a generalization of hierarchical Bayesian modeling. Following our main result, we present applications to kernel smoothing, time-depreciating models, and voting mechanisms.
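The aggregation rule stated at the end of this abstract can be sketched for a finite set of states of nature. The representation below is hypothetical: each model is stored as a weight, an integer level encoding the weak order (lower level means higher ranked), and a prior given as a list of probabilities. The names `aggregate` and `models` are illustrative only.

```python
# Sketch only: weighted average of the priors of the highest-ranked models in S.
# The (weight, level, prior) representation is an assumption for illustration.

def aggregate(models, S):
    """Return the aggregated prior for the subset S of model names."""
    best_level = min(models[m][1] for m in S)
    top = [m for m in S if models[m][1] == best_level]
    total_weight = sum(models[m][0] for m in top)
    n_states = len(models[top[0]][2])
    return [
        sum(models[m][0] * models[m][2][i] for m in top) / total_weight
        for i in range(n_states)
    ]

models = {
    "A": (1.0, 0, [0.5, 0.5]),
    "B": (3.0, 0, [0.9, 0.1]),
    "C": (5.0, 1, [0.0, 1.0]),  # ranked below A and B
}

prior_abc = aggregate(models, {"A", "B", "C"})  # C is ignored: it is not top-ranked
prior_c = aggregate(models, {"C"})
print(prior_abc, prior_c)
```

Model C carries the largest weight but does not influence the aggregate whenever a higher-ranked model is present, which is the lexicographic behaviour the rule describes.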
A Multiclass Classification Approach to Label Ranking
Clémençon, Stephan, Vogel, Robin
In multiclass classification, the goal is to learn how to predict a random label $Y$, valued in $\mathcal{Y}=\{1,\; \ldots,\; K \}$ with $K\geq 3$, based upon observing a r.v. $X$, taking its values in $\mathbb{R}^q$ with $q\geq 1$ say, by means of a classification rule $g:\mathbb{R}^q\to \mathcal{Y}$ with minimum probability of error $\mathbb{P}\{Y\neq g(X) \}$. However, in a wide variety of situations, the task targeted may be more ambitious, consisting in sorting all the possible label values $y$ that may be assigned to $X$ by decreasing order of the posterior probability $\eta_y(X)=\mathbb{P}\{Y=y \mid X \}$. This article is devoted to the analysis of this statistical learning problem, halfway between multiclass classification and posterior probability estimation (regression) and referred to as label ranking here. We highlight the fact that it can be viewed as a specific variant of ranking median regression (RMR), where, rather than observing a random permutation $\Sigma$ assigned to the input vector $X$ and drawn from a Bradley-Terry-Luce-Plackett model with conditional preference vector $(\eta_1(X),\; \ldots,\; \eta_K(X))$, the sole information available for training a label ranking rule is the label $Y$ ranked on top, namely $\Sigma^{-1}(1)$. Inspired by recent results in RMR, we prove that under appropriate noise conditions, the One-Versus-One (OVO) approach to multiclassification yields, as a by-product, an optimal ranking of the labels with overwhelming probability. Beyond theoretical guarantees, the relevance of the approach to label ranking promoted in this article is supported by experimental results.
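As a rough illustration of how a One-Versus-One scheme can yield a label ranking as a by-product, the sketch below counts pairwise wins for each label and sorts labels by that count. The pairwise classifier used here (nearest class centroid) and all function names are assumptions chosen to keep the example small; the abstract does not prescribe them.

```python
# Sketch only: rank labels by the number of pairwise (OVO) duels they win.
# Nearest-centroid duels are an illustrative stand-in for a real binary learner.
from itertools import combinations

def centroid(points):
    dim = len(points[0])
    return [sum(p[i] for p in points) / len(points) for i in range(dim)]

def sq_dist(a, b):
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))

def fit_ovo(X, Y):
    """Build one duel per pair of labels, represented by the two class centroids."""
    by_label = {}
    for x, y in zip(X, Y):
        by_label.setdefault(y, []).append(x)
    cents = {y: centroid(pts) for y, pts in by_label.items()}
    return {(k, l): (cents[k], cents[l]) for k, l in combinations(sorted(cents), 2)}

def rank_labels(model, x):
    """Return labels sorted by decreasing number of duels won (ties by label)."""
    wins = {}
    for (k, l), (ck, cl) in model.items():
        wins.setdefault(k, 0)
        wins.setdefault(l, 0)
        winner = k if sq_dist(x, ck) <= sq_dist(x, cl) else l
        wins[winner] += 1
    return sorted(wins, key=lambda y: (-wins[y], y))

X = [[0.0], [0.2], [1.0], [1.2], [3.0], [3.2]]
Y = [1, 1, 2, 2, 3, 3]
model = fit_ovo(X, Y)
ranking_low = rank_labels(model, [0.0])
ranking_high = rank_labels(model, [3.0])
print(ranking_low, ranking_high)
```

The top-ranked label is the usual OVO prediction; the remaining order comes for free from the same vote counts.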
Ranking Median Regression: Learning to Order through Local Consensus
Clémençon, Stephan, Korba, Anna, Sibony, Eric
This article is devoted to the problem of predicting the value taken by a random permutation $\Sigma$, describing the preferences of an individual over a set of numbered items $\{1,\; \ldots,\; n\}$ say, based on the observation of an input/explanatory r.v. $X$ (e.g. characteristics of the individual), when error is measured by the Kendall $\tau$ distance. In the probabilistic formulation of the 'Learning to Order' problem we propose, which extends the framework for statistical Kemeny ranking aggregation developed in \citet{CKS17}, this boils down to recovering conditional Kemeny medians of $\Sigma$ given $X$ from i.i.d. training examples $(X_1, \Sigma_1),\; \ldots,\; (X_N, \Sigma_N)$. For this reason, this statistical learning problem is referred to as \textit{ranking median regression} here. Our contribution is twofold. We first propose a probabilistic theory of ranking median regression: the set of optimal elements is characterized, the performance of empirical risk minimizers is investigated in this context and situations where fast learning rates can be achieved are also exhibited. Next we introduce the concept of local consensus/median, in order to derive efficient methods for ranking median regression. The major advantage of this local learning approach lies in its close connection with the widely studied Kemeny aggregation problem. From an algorithmic perspective, this makes it possible to build predictive rules for ranking median regression by implementing efficient techniques for (approximate) Kemeny median computations at a local level in a tractable manner. In particular, versions of $k$-nearest neighbor and tree-based methods, tailored to ranking median regression, are investigated. Accuracy of piecewise constant ranking median regression rules is studied under a specific smoothness assumption for $\Sigma$'s conditional distribution given $X$.
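The idea of a local Kemeny median can be made concrete with a toy $k$-nearest-neighbor rule. This is a minimal sketch under simplifying assumptions that are not from the abstract: inputs are scalars, a permutation is stored as a tuple whose i-th entry is the rank of item i, and the median is found by brute force over all permutations, which is only feasible for very small $n$.

```python
# Sketch only: k-NN ranking median regression with an exact brute-force median.
# Scalar inputs and exhaustive search are illustrative simplifications.
from itertools import combinations, permutations

def kendall_tau(s1, s2):
    """Number of item pairs that the two rankings order differently."""
    return sum(
        1
        for i, j in combinations(range(len(s1)), 2)
        if (s1[i] - s1[j]) * (s2[i] - s2[j]) < 0
    )

def kemeny_median(perms):
    """Permutation minimizing the summed Kendall tau distance to perms."""
    n = len(perms[0])
    return min(
        permutations(range(1, n + 1)),
        key=lambda c: sum(kendall_tau(c, s) for s in perms),
    )

def knn_rmr(train, x, k):
    """Predict the Kemeny median of the permutations of the k nearest inputs."""
    neighbors = sorted(train, key=lambda ex: abs(ex[0] - x))[:k]
    return kemeny_median([sigma for _, sigma in neighbors])

train = [
    (0.0, (1, 2, 3)), (0.1, (1, 2, 3)), (0.2, (1, 3, 2)),
    (1.0, (3, 2, 1)), (1.1, (3, 2, 1)), (1.2, (3, 1, 2)),
]
pred_left = knn_rmr(train, 0.05, 3)
pred_right = knn_rmr(train, 1.05, 3)
print(pred_left, pred_right)
```

For larger item sets the exhaustive search would have to be replaced by an approximate Kemeny aggregation method, as the abstract suggests.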
A ranking approach to global optimization
Malherbe, Cédric, Vayatis, Nicolas
We consider the problem of maximizing an unknown function $f$ over a compact and convex set using as few observations as possible. We observe that the optimization of the function essentially relies on learning the bipartite ranking rule induced by $f$. Based on this idea, we relate global optimization to bipartite ranking, which makes it possible to address problems with high-dimensional input spaces, as well as functions with weak regularity properties. The paper introduces novel meta-algorithms for global optimization which rely on the choice of any bipartite ranking method. Theoretical properties are provided as well as convergence guarantees, and equivalences between various optimization methods are obtained as a by-product. Finally, numerical evidence is given to show that the main algorithm of the paper, which adapts empirically to the underlying ranking structure, essentially outperforms existing state-of-the-art global optimization algorithms in typical benchmarks.
Pranking with Ranking
We discuss the problem of ranking instances. In our framework each instance is associated with a rank or a rating, which is an integer from 1 to k. Our goal is to find a rank-prediction rule that assigns each instance a rank which is as close as possible to the instance's true rank. We describe a simple and efficient online algorithm, analyze its performance in the mistake bound model, and prove its correctness. We describe two sets of experiments, with synthetic data and with the EachMovie dataset for collaborative filtering.
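The abstract does not spell out the online algorithm, so the following is only a perceptron-style sketch of a rank-prediction rule with a weight vector and k-1 ordered thresholds, written under my own assumptions about the update. The names `predict` and `update`, the zero initialization, and the toy data are illustrative and should not be read as the paper's exact procedure.

```python
# Sketch only: online rank prediction with a weight vector w and thresholds b.
# The update rule below is an assumed perceptron-style variant, for illustration.

def predict(w, b, x):
    """Smallest rank r with w.x - b[r-1] < 0; rank len(b)+1 if none is negative."""
    score = sum(wi * xi for wi, xi in zip(w, x))
    for r, br in enumerate(b, start=1):
        if score - br < 0:
            return r
    return len(b) + 1

def update(w, b, x, y):
    """On a wrong prediction, move w and every violated threshold."""
    if predict(w, b, x) == y:
        return w, b
    score = sum(wi * xi for wi, xi in zip(w, x))
    taus = []
    for r, br in enumerate(b, start=1):
        target = -1 if y <= r else 1  # desired sign of score - b[r-1]
        taus.append(target if (score - br) * target <= 0 else 0)
    w = [wi + sum(taus) * xi for wi, xi in zip(w, x)]
    b = [br - t for br, t in zip(b, taus)]
    return w, b

k = 3
data = [([0.0], 1), ([0.1], 1), ([0.4], 2), ([0.5], 2), ([0.9], 3), ([1.0], 3)]
w, b = [0.0], [0.0] * (k - 1)
for _ in range(1000):  # repeated passes over the toy stream
    for x, y in data:
        w, b = update(w, b, x, y)
final = [predict(w, b, x) for x, _ in data]
print(w, b, final)
```

The toy data are separable by a single direction and two thresholds, so repeated passes settle on a rule that assigns every training instance its true rank.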
Pranking with Ranking
We discuss the problem of ranking instances. In our framework each instance is associated with a rank or a rating, which is an integer from 1 to k. Our goal is to find a rank-prediction rule that assigns each instance a rank which is as close as possible to the instance's true rank. We describe a simple and efficient online algorithm, analyze its performance in the mistake bound model, and prove its correctness. We describe two sets of experiments, with synthetic data and with the EachMovie dataset for collaborative filtering. In the experiments we performed, our algorithm outperforms online algorithms for regression and classification applied to ranking.
1 Introduction
The ranking problem we discuss in this paper shares common properties with both classification and regression problems. As in classification problems, the goal is to assign one of k possible labels to a new instance. As in regression problems, the set of k labels is structured, as there is a total order relation between the labels. We refer to the labels as ranks and without loss of generality assume that the ranks constitute the set {1, 2, ...